EXECUTIVE SUMMARY


To  improve  the  computational  efficiency  in  developing  an  editing  and  imputation  (E/I)  error 
localization  module  for  the  U.S.  Census  of  Agriculture  and  other  large  surveys  conducted  by 
NASS,  we  addressed  the  methodological  issue  of  variable  elimination  by  equality  edit  in  linear 
editing.  This  will  simplify  the  linear  edit  system  and  reduce  the  magnitude  of  computation. 

Though,  in  linear  programming,  the  Fourier  elimination  method  for  linear  inequalities  has 
long  been  used,  the  role  of  equality  edits  in  linear  editing  has  not  been  fully  explored.  All  the 
automatic  computer  E/I  systems  for  numerical  data  have  generally  treated  equality  edits  as  a 
special  case  of  inequality  edits.  A  common  practice  has  been  to  represent  an  equality  edit  by  two 
inequalities  of  opposite  direction.  However,  an  equality  edit  defines  a  more  informative 
relationship  than  an  inequality  edit.  Therefore,  the  contribution  of  an  equality  edit  to  an  editing 
problem  should  be  more  than  that  of  an  inequality  edit. 

Our  research  results,  extending  some  of  Fellegi  and  Holt  (1976)  results  on  linear  edits, 
establish  the  methodology  of  variable  elimination  by  equality  edit  in  linear  editing,  which  leads 
to  a  simplified  linear  editing  problem  in  reduced  dimension. 

The methodology established in this paper can be particularly useful for the U.S. Census
of Agriculture editing and imputation, for which a considerable number of the linear edits are
equality  ones.  It  is  expected  that  the  implementation  of  this  methodology,  in  conjunction  with 
other  computational  improvements,  may  enable  Fellegi-Holt  methodology  to  be  implemented 
into  the  editing  systems  for  future  censuses  and  sample  surveys  with  improved  efficiency  and 
accuracy. 


RECOMMENDATIONS 

The  Census  of  Agriculture  requires  a  very  extensive  editing  system,  featuring  a  large  number 
of  equality  edits.  As  a  result,  the  variable  elimination  methodology  provided  by  this  paper  can  be 
especially  useful  in  the  context  of  researching  the  possible  incorporation  of  error-localization  into 
the  editing  system  for  the  2007  Census  of  Agriculture.  The  following  steps  for  further  research 
are  recommended: 

1)  Develop  an  automated  approach  for  implementing  the  proposed  variable  elimination 
approach  from  an  initial  set  of  linear  edits. 

2)  Calculate  the  computational  gains  from  implementing  this  methodology  on  the  linear  edits 
prepared  specifically  for  the  Census  of  Agriculture. 
ELIMINATION IN LINEAR EDITING AND ERROR LOCALIZATION


Stanley  S.  Weng 


This  paper  presents  some  theoretical  findings  from  our  recent  methodological  research  addressing  the  issue 
of  variable  elimination  by  equality  edit  in  linear  editing.  The  research  was  motivated  by  seeking  improvement 
of  computational  efficiency  for  error  localization,  when  implementing  an  error  localization  module  for  the 
editing  and  imputation  of  NASS’  large  surveys.  Our  results,  extending  some  of  Fellegi  and  Holt  (1976) 
results  on  linear  edits,  establish  the  method  of  elimination  by  equality  edit  in  linear  editing,  which  leads  to  a 
simplified  linear  editing  problem  in  reduced  dimension. 

The methodology established in this paper can be particularly useful as applied to the U.S. Census of
Agriculture editing and imputation, for which a considerable number of the linear edits are equality ones. It is
expected  that  the  implementation  of  this  methodology,  in  conjunction  with  other  computational 
improvements,  may  enable  Fellegi-Holt  methodology  to  be  implemented  into  the  editing  systems  for  future 
censuses  and  sample  surveys  with  improved  efficiency  and  accuracy. 

KEY  WORDS:  Automatic  editing  and  imputation;  Fellegi-Holt  methodology;  Implied  edit;  Fourier 
elimination;  Elimination  by  equality  edit. 


1.  INTRODUCTION 

For  the  error  localization  (EL)  problem  in 
automatic  data  editing  and  imputation  (E/I) 
with  linear  edits  under  the  Fellegi-Holt  (F-H) 
methodology  (Fellegi  and  Holt,  1976),  the 
linear  programming  approach  provides  proper 
methods  for  solution  (Rubin,  1975;  Sande, 
1978;  Schiopu-Kratina  and  Kovar,  1989). 
However,  in  practice,  the  computational 
efficiency  of  error  localization  has  been  an 
issue  (Winkler,  1999;  Winkler  and  Chen, 
2002).  Various  efforts  have  been  made  to 
improve  the  efficiency,  including  using  an 
algorithm other than Chernikova's for linear
programming, e.g., one based on Duffin's
(1974) analysis of a system of linear
inequalities (Houbiers, 1999); a tree-search
approach instead of a Chernikova's algorithm-like
process (Quere, 2000; Quere and De
Waal, 2000); and even an entirely different
approach, while still in the spirit of F-H
(Bankier, 2000; Bankier, et al., 2000).

One  other  consideration  is  to  simplify  the 
linear  edit  system  by  using  its  special  structure 
and  features,  to  reduce  the  dimension  of  the 
system  and  thus  the  magnitude  of  computation 
for  error  localization. 

Edits  used  in  economic  surveys  and 
censuses,  like  those  created  by  NASS/USDA 
for  the  U.S.  Census  of  Agriculture,  are 
primarily  linear.  They  also  contain  a 
considerable  number  of  equality  edits,  for 
example  balance  edits,  in  which  an  aggregate 
variable  is  equal  to  the  sum  of  its  component 
variables. 

In  the  presence  of  equality  edits  in  a  linear 
edit  system,  it  seems  preferable  to  use  the 
equality  edits  to  eliminate  fields  (variables), 
leading  to  a  simplified  system  in  reduced 
dimension. However, until now, none of the
automatic computer E/I systems for numerical
data have distinguished conceptually between
equality  and  inequality  edits.  Equality  edits 
have  generally  been  treated  as  a  special  case  of 
inequality  edits.  Some  algorithms  adopted  the 
representation  of  an  equality  edit  by  two 
inequalities  of  opposite  direction.  Such 
handling  seems  to  ignore  the  more  informative 
specification  of  an  equality  edit.  The  equality 
form  defines  a  more  restrictive  relationship 
than that of an inequality. In linear theory, an
equality represents a lower-dimensional
hyperplane in the linear data space. The
contribution of an equality edit to an editing
problem should therefore be greater than that of an
inequality edit.

From  the  point  of  view  of  F-H 
methodology,  there  is  an  important  distinction 
between  equality  and  inequality  edits  in  their 
generation  of  implied  edits.  This  paper 
identifies  such  a  distinction  and  establishes  a 
method  of  using  equality  edits  to  eliminate 
fields  and  reach  an  equivalent  linear  edit 
system,  for  which  all  the  inequality  edits  form 
a  linear  edit  system  of  lower  dimension.  The 
original  linear  editing  problem  can  be  solved 
by  first  solving  the  problem  with  respect  to 
this  reduced  system,  and  then  determining  the 
remaining  fields  by  the  specification  of  the 
equality  edits. 

Benefits  in  computational  efficiency  from 
this  methodology  can  be  significant.  The 
magnitude  of  the  editing  problem  is  reduced 
through  elimination,  and  the  program  needs 
only  to  handle  inequality  edits. 

The  outline  of  this  report  is  as  follows. 
Section  2  describes  the  basic  setting  and 
concepts  of  linear  editing.  Section  3  reviews 
some  basic  concepts  and  results  of  the  F-H 
theory in the context of linear editing that are
related to the topic of this paper. Section 4 is a
brief  review  of  some  mathematical  concepts  of 
Fourier  elimination.  Section  5  presents  our 
theoretical  results  on  the  methodology  of 
elimination  by  equality  edit.  Section  6  gets 
back  to  the  main  editing  problem,  error 
localization,  which  motivated  this  research 
and  now  can  be  solved  in  reduced  scale  with 
improved  efficiency.  Section  7  briefly 
discusses  the  implementation  issue.  Section  8 
gives  our  recommendations.  The  technical 
Appendix  contains  the  proof  of  the  theoretical 
results  of  this  paper. 

2.  LINEAR  EDITING 

The  editing  problem  of  numerical  data 
from  a  survey/census  is  generally  defined  by  a 
set  of  linear  edits  in  the  following  form: 

$e_i:\ a_{i1}x_1 + a_{i2}x_2 + \cdots + a_{in}x_n \le b_i, \quad i = 1, 2, \ldots, m$  (1a)

with positivity constraints for the variables

$x_j \ge 0, \quad j = 1, 2, \ldots, n$  (1b)

Here in (1a) the inequality sign may represent
either inequality or equality. In matrix
notation, the above linear edit system is
written as

$Ax \le b$  (2a)

and

$x \ge 0$  (2b)

where $A$ ($m \times n$) is the edit coefficient
matrix of (1a), $b$ ($m \times 1$) is the right-hand-side
vector of (1a), and $x = (x_1, x_2, \ldots, x_n)^T$
is  the  data  vector  (where  T  denotes  transpose 
of  a  vector).  Data  editing  so  specified  is  called 
linear  editing.  Additional  constraints  may  be 
added  to  the  above  basic  setting  to  define 
various  linear  editing  problems,  for  example 
error  localization,  that  will  be  described  in 
Section  6. 

A  data  record  is  a  passing  record  with 
respect  to  a  linear  edit  system  if  the  record 
satisfies  all  edits  in  the  system.  Otherwise,  the 
record  is  a  failed  one.  All  data  points  that 
satisfy  the  linear  edit  system  form  the  feasible 
area  of  the  system.  A  passing  record  is  also 
called  feasible,  and  a  failed  record  infeasible. 
A  linear  edit  system  is  completely  described 
by  its  feasible  area.  Two  linear  edit  systems 
are  considered  equivalent  if  they  have 
identical  feasible  areas.  Geometrically,  the 
feasible  area  of  a  linear  system  is  a  polyhedron 
in  the  data  space. 
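
As a minimal sketch of the passing/failed distinction, the following Python fragment checks one record against a linear edit system of the form (1a)-(1b). The function name `check_record` and the `equality_rows` argument are illustrative assumptions, not part of any existing E/I system; the edits used are those of Example 2 in Section 5.

```python
def check_record(A, b, x, equality_rows=frozenset()):
    """Return (indices of failed edits, indices of negative fields).

    Rows of A listed in equality_rows are read as equality edits; all other
    rows are read as 'row . x <= rhs'. Both lists empty means a passing record.
    """
    failed = []
    for i, (row, rhs) in enumerate(zip(A, b)):
        lhs = sum(a * v for a, v in zip(row, x))
        passed = (lhs == rhs) if i in equality_rows else (lhs <= rhs)
        if not passed:
            failed.append(i)
    # Positivity constraints (1b).
    negative = [j for j, v in enumerate(x) if v < 0]
    return failed, negative

# Edit system of Example 2: two inequality edits and one equality edit.
A = [[2, 1, 1], [1, 2, 3], [1, 1, 2]]
b = [4, 5, 3]
print(check_record(A, b, [1, 2, 0], equality_rows={2}))  # passing record
print(check_record(A, b, [2, 2, 0], equality_rows={2}))  # failed record
```

The exact comparison for equality edits is safe here because the data are integers; with floating-point data a tolerance would presumably be needed.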

We  are  actually  in  the  setting  of  linear 
programming  (Gass,  1985;  Kotz  &  Johnson 
(Ed),  1985;  Luenberger,  1984).  Linear  editing 
problems,  such  as  error  localization,  are 
generally  related  to  solutions  of  a  linear 
program.  A  linear  program  can  be  solved  by 
finding  the  set  of  all  extremal  points  of  its 
feasible area. Chernikova's algorithm
(Chernikova, 1964, 1965) is used to find all
extremal  points  of  a  linear  system  of 
nonnegative  variables. 

3.  F-H  THEOREM  ON  LINEAR  EDITS 

Fellegi  and  Holt  (1976)  established  the 
fundamental  theory  of  automatic  editing  and 
imputation  in  the  following  criteria,  widely 
referred  to  as  the  F-H  principles: 

(1) The data in each record should be made to
satisfy all edits by changing the fewest
possible items of data (fields).

(2)  Imputation  rules  should  be  derived  from 
the  corresponding  edit  rules  without  explicit 
specification. 

(3)  When  imputation  takes  place,  it  should 
maintain,  as  far  as  possible,  the  frequency 
structure  of  the  data  file. 

For  a  failed  record,  identifying  the  fewest 
possible  fields  that  may  be  changed  to  make 
the  resulting  record  satisfy  all  edits  is  the  error 
localization  problem. 

To  solve  the  error  localization  problem,  F- 
H  showed  that  both  explicit  (the  original) 
edits,  as  specified  by  subject-matter  experts, 
and  implied  edits  are  needed.  An  implied  edit 
is  one  that  is  logically  implied  by  a  set  of 
explicit  edits.  An  implied  edit  is  said  to  be  an 
essentially  new  edit  if  it  does  not  involve  all 
the  fields  (variables)  explicitly  involved  in  the 
edits  that  generated  it.  A  field  that  is 
eliminated  in  generating  an  essentially  new 
implied  edit  is  called  a  generating  field  of  the 
implied  edit.  A  set  of  edits  together  with  all 
essentially  new  implied  edits  that  can  be 
generated  from  the  set  of  edits,  forms  a 
complete  set  of  edits.  The  concept  of  a 
complete  set  of  edits  is  crucial  in  F-H  theory, 
which  underlies  their  main  theorem. 

We  focus  on  linear  editing.  For  linear 
edits,  the  generation  of  essentially  new  edits 
and  the  derivation  of  a  complete  set  of  edits 
take  an  explicit  form,  as  given  by  Theorem  3 
of  Fellegi  and  Holt  (1976).  The  following  is  a 
restatement  of  the  theorem. 

Theorem (F-H, 1976). An essentially new
implied edit $e_t$ is generated from edits $e_r$ and
$e_s$, as in (1a), using field $j$ as a generating
field, if and only if $a_{rj}$ and $a_{sj}$ are both nonzero
and of opposite sign. The coefficients of the
new edit, $a_{tk}$, are given by

$a_{tk} = a_{sk} a_{rj} - a_{rk} a_{sj}, \quad k = 1, 2, \ldots, n,$

where $r$ and $s$ are so chosen that $a_{rj} > 0$ and
$a_{sj} < 0$. Repeated application of the above
procedure will derive all essentially new
implied edits.

The theorem simply states that from two
linear inequalities where the inequality signs
are in the same direction, a variable can be
eliminated by taking their linear combination
if and only if the variable has coefficients in
the two inequalities which are of the opposite
sign. The essence of generating an essentially
new implied edit is elimination of a field.

Example 1 (Generation of essentially new
implied edits). Consider the following set of
linear edits:

E1: $2x + y \ge 20$,
E2: $x - 2y \ge 10$,
E3: $-3x + y \ge -60$,
E4: $-3x - y \ge -80$.

Using $y$ as the generating field, the following
essentially new edits may be generated:

$5x \ge 50$  (2*E1 + E2)
$-5x \ge -110$  (2*E3 + E2)
$-x \ge -60$  (E1 + E4)
$-6x \ge -140$  (E3 + E4)

The elimination operations are indicated in the
parentheses. And, using $x$ as the generating
field, the following essentially new implied
edits may be generated:

$5y \ge -60$  (3*E1 + 2*E3)
$-5y \ge -30$  (3*E2 + E3)
$y \ge -100$  (3*E1 + 2*E4)
$-7y \ge -50$  (3*E2 + E4)

We may continue to generate implied
edits, though maybe redundant, from the
above generated implied edits and the original
edits.


4.  FOURIER  ELIMINATION 

In  linear  theory,  the  method  used  in  F-H 
Theorem  3  to  generate  essentially  new  implied 
edits  is  called  Fourier  elimination  (Duffin, 
1974;  Fourier,  1826;  Schrijver,  1986).  This 
approach  was  proposed  by  Fourier  to  solve 
linear  programming  problems  by  elimination 
of variables. A variable, say, $x_h$, can be
eliminated by taking positive combinations of
two inequalities which have opposite signs in
the coefficient of $x_h$. By adding suitable
combinations of all possible pairs of
inequalities with a positive and a negative
coefficient of $x_h$, and subsequently adding all
inequalities that did not contain $x_h$ in the first
place, one gets a new system of inequalities
which does not contain variable $x_h$. This
process can continue in successive elimination
of other variables.

In  a  Fourier  elimination  process,  the 
number  of  inequalities  can  grow  excessively. 
Moreover,  by  taking  all  possible  linear 
combinations  of  the  original  inequalities 
during  the  elimination  process,  it  could  easily 
occur  that  some  inequalities  become 
redundant.  That  is,  an  inequality  can  be 
written  as  a  positive  linear  combination  of 
some  of  the  other  inequalities.  Duffin  (1974), 
in his method of analyzing systems of linear
inequalities, proposed a "refined elimination"
rule which deletes any inequality which has
been generated by adding $t + 2$ or more of the
original inequalities, when $t$ variables have
been eliminated. Houbiers (1999) applied
Duffin's method to error localization.

Fourier’s  original  problem  of  interest  was 
whether  a  feasible  solution  to  a  specified  set  of 
linear  inequalities  exists.  This  can  be  restated, 
in the terminology of modern automatic data
editing, as whether a set of fields can be
imputed in such a way that a specified set of
linear edits can be satisfied. Fourier's method
of successive elimination has fostered modern
automatic data editing, as generalized in the F-H
methodology.
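
The link between Fourier elimination and imputability can be sketched in a few lines of Python. This is a brute-force illustration under stated assumptions, not the Chernikova-based or tree-search algorithms cited in this paper: all edits are written as `coeffs . x <= rhs`, and the helper names `fourier_eliminate`, `can_repair` and `localize_errors` are hypothetical. A set of fields can be imputed exactly when, after fixing the other fields at their recorded values and eliminating the free fields, every remaining constant inequality holds.

```python
from fractions import Fraction
from itertools import combinations

def fourier_eliminate(rows, j):
    """Eliminate field j from rows (coeffs, rhs), each read as coeffs . x <= rhs."""
    pos = [r for r in rows if r[0][j] > 0]
    neg = [r for r in rows if r[0][j] < 0]
    out = [r for r in rows if r[0][j] == 0]   # rows without field j carry over
    for pc, pb in pos:
        for nc, nb in neg:
            # Multipliers -nc[j] > 0 and pc[j] > 0 cancel the coefficient of field j.
            coeffs = [-nc[j] * p + pc[j] * n for p, n in zip(pc, nc)]
            out.append((coeffs, -nc[j] * pb + pc[j] * nb))
    return out

def can_repair(edits, record, free):
    """True if changing only the fields in `free` can make the record pass."""
    n = len(record)
    rows = [([Fraction(c) for c in a], Fraction(b)) for a, b in edits]
    # Positivity constraints x_j >= 0, written as -x_j <= 0.
    rows += [([Fraction(-1 if k == j else 0) for k in range(n)], Fraction(0))
             for j in range(n)]
    reduced = []
    for a, b in rows:
        # Move the contribution of the fixed fields to the right-hand side.
        fixed = sum(a[k] * record[k] for k in range(n) if k not in free)
        reduced.append(([a[k] if k in free else Fraction(0) for k in range(n)],
                        b - fixed))
    for j in free:
        reduced = fourier_eliminate(reduced, j)
    # No variables remain, so every row now reads 0 <= rhs.
    return all(rhs >= 0 for _, rhs in reduced)

def localize_errors(edits, record):
    """All smallest sets of fields whose change lets the record pass."""
    n = len(record)
    for size in range(n + 1):
        hits = [set(s) for s in combinations(range(n), size)
                if can_repair(edits, record, set(s))]
        if hits:
            return hits
    return []

edits = [([1, 1], 10), ([1, -1], 0)]    # x1 + x2 <= 10 and x1 <= x2
print(localize_errors(edits, [8, 5]))   # [{0}]: only field x1 needs to change
```

The enumeration over subsets is exponential in the number of fields and is meant only to make the definitions concrete; it also shows why reducing the number of fields before localization matters.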

5.  ELIMINATION  BY  EQUALITY  EDIT 

In  addressing  linear  editing  problems,  it 
seems  that  the  role  of  equality  edits  has  not 
been  fully  explored.  Equality  edits  have 
generally  been  treated  as  a  special  case  of 
inequality  edits,  without  using  the  defining 
feature,  the  deterministic  aspect,  of  an  equality 
edit.  Actually,  from  the  implied  edit  point  of 
view,  there  is  an  important  distinction  between 
equality  edits  and  inequality  edits  in  their 
generation  of  implied  edits,  as  shown  by  two 
lemmas  to  be  introduced  below. 

Before  stating  the  lemmas,  we  introduce 
the  concept  of  equivalent  edits.  Two  sets  of 
edits  are  equivalent,  if  they  imply  each  other, 
that  is,  each  edit  in  one  set  is  implied  by  (some 
edits of) the other set. In the linear edit
context, two sets of linear edits are equivalent
if their feasible areas (and thus their sets of extremal
points) are identical. Two sets of equivalent
linear edits have the same contribution to a
linear edit system, and may thus replace each
other. Editing problems with respect to two
equivalent sets of edits are considered the
same.

The following two lemmas extend the
statements of Fellegi and Holt (1976)
Theorem 3 in the situation where one edit is
an  equality.  They  state  that,  in  such  situations, 
it  is  always  possible  to  generate  an  essentially 
new  implied  edit  when  a  common  field  is 
involved.  Furthermore,  the  original  inequality 
edit  can  be  replaced  by  the  essentially  new 
implied  edit  generated. 

Lemma 1. An essentially new implied edit
can always be generated from edits $e_r$ and $e_s$,
where $e_s$ is an equality edit, using field $j$ as a
generating field, provided the coefficients of
field $j$ in the two edits are both nonzero.

Lemma 2. An inequality edit $e_r$ can be
replaced by an essentially new implied edit
$e_t$ generated from $e_r$ and an equality edit $e_s$.

Proofs of Lemma 1 and Lemma 2 are given
in the Appendix of this report.
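
An informal sketch of why the two lemmas are plausible is given below. It is not the Appendix proof; the multiplier $\lambda$ is an auxiliary symbol introduced here for illustration, with $e_r$ and $e_s$ written in the form (1a).

```latex
\begin{aligned}
e_r &:\ \sum_{k} a_{rk} x_k \le b_r, \qquad
e_s :\ \sum_{k} a_{sk} x_k = b_s, \qquad a_{rj} \ne 0,\ a_{sj} \ne 0, \\
\lambda &= a_{rj} / a_{sj}
  \quad \text{(either sign is admissible, since an equality may be multiplied by any real number)}, \\
e_t = e_r - \lambda e_s &:\ \sum_{k \ne j} \left( a_{rk} - \lambda a_{sk} \right) x_k \le b_r - \lambda b_s , \\
e_r &= e_t + \lambda e_s
  \quad \text{so that } \{e_t, e_s\} \text{ and } \{e_r, e_s\} \text{ imply each other.}
\end{aligned}
```

The third line corresponds to Lemma 1 (no opposite-sign condition is needed), and the last line to Lemma 2 (the pair $\{e_t, e_s\}$ is equivalent to the pair $\{e_r, e_s\}$).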

The  above  lemmas  show  how  an  equality 
edit  can  be  used  to  simplify  a  linear  edit 
system.  Based  on  these  two  lemmas,  our  next 
two  theorems  show  that,  just  as  elimination  of 
free  variables  can  be  made  using  equalities  in 
the  linear  system,  so  can  elimination  of 
positively  constrained  variables  using  the 
equality  edits  present  in  the  linear  edit  system. 
The  theorems  are  stated  in  the  context  of 
linear  editing  through  the  F-H  concept  of 
implied  edit. 

Theorem 1 (Elimination by equality edit).
Suppose a linear edit system contains $m$
inequality edits and one equality edit, with $n$
positivity constraints for the $n$ fields involved.
Then, one field with a nonzero coefficient in the equality edit
can be eliminated from all other edits
involving that field. The resulting new linear
edit system contains $m + 1$ inequality edits
involving $n - 1$ fields, with $n - 1$
corresponding positivity constraints, and the
original equality edit. The new system is
equivalent to the original one. The extremal
points of the original linear system can thus be
obtained by first obtaining the extremal points
in the $n - 1$ fields of the new linear system
excluding the equality edit, and then
determining the remaining field by the
equality edit.

Proof  of  Theorem  1  is  given  in  Appendix. 
The  following  example  illustrates  the 
elimination  method,  as  stated  in  the  proof  of 
Theorem  1. 

Example  2.  Consider  the  following  set  of 
linear  edits: 

$2x_1 + x_2 + x_3 \le 4$,  (4)
$x_1 + 2x_2 + 3x_3 \le 5$,
$x_1 + x_2 + 2x_3 = 3$,
$x_j \ge 0, \quad j = 1, 2, 3$.

We use the equality edit to eliminate a
variable, say, $x_3$, in the two inequality edits.
Eliminating $x_3$ in the first inequality edit, we
have

$3x_1 + x_2 \le 5$.

Eliminating $x_3$ in the second inequality edit,
we have

$-x_1 + x_2 \le 1$.

And, the essentially new implied edit
generated by the positivity constraint $x_3 \ge 0$
and the equality edit:

$x_1 + x_2 \le 3$.

In the $(x_1, x_2)$ space, solve the last three
inequalities, and there are three extremal
points:

$(5/3, 0)$, $(0, 1)$, and $(1, 2)$.

Now calculate $x_3$ using the equality edit,
$x_3 = (3 - x_1 - x_2)/2$; it follows that
$x_3 = 2/3, 1, 0$, respectively. The extremal
points of the original system thus are

$(5/3, 0, 2/3)$, $(0, 1, 1)$, and $(1, 2, 0)$.
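
The substitution step of Example 2 can be sketched in Python as follows. The function name `eliminate_by_equality` and the `(coeffs, rhs)` row layout are assumptions made for this illustration; inequality rows are read as `coeffs . x <= rhs`, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def eliminate_by_equality(eq, ineqs, j):
    """Use the equality edit eq = (coeffs, rhs) to remove field j from every
    inequality edit and from the positivity constraint x_j >= 0."""
    ec = [Fraction(c) for c in eq[0]]
    eb = Fraction(eq[1])
    if ec[j] == 0:
        raise ValueError("field j needs a nonzero coefficient in the equality edit")
    n = len(ec)
    # Positivity of the eliminated field, -x_j <= 0, becomes an ordinary inequality.
    rows = list(ineqs) + [([-1 if k == j else 0 for k in range(n)], 0)]
    reduced = []
    for coeffs, rhs in rows:
        lam = Fraction(coeffs[j]) / ec[j]   # may be negative: eq is an equality
        new = [Fraction(c) - lam * e for c, e in zip(coeffs, ec)]
        reduced.append((new, Fraction(rhs) - lam * eb))
    return reduced

# Example 2: eliminate x3 (index 2) using x1 + x2 + 2*x3 = 3.
equality = ([1, 1, 2], 3)
inequalities = [([2, 1, 1], 4), ([1, 2, 3], 5)]
for coeffs, rhs in eliminate_by_equality(equality, inequalities, 2):
    # Scale by 2 to recover the integer form used in the text.
    print([2 * c for c in coeffs], 2 * rhs)
```

With $m = 2$ inequality edits the result has $m + 1 = 3$ rows in the two remaining fields, in line with Theorem 1; the rows are the three reduced inequalities of the example, each divided by 2.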

Theorem  1  may  be  extended  to  linear  edit 
systems  containing  multiple  equality  edits,  as 
follows. 

Theorem 2. Suppose a linear edit system
contains $m$ inequality edits and $q$ equality
edits, with $n$ positivity constraints for
the $n$ fields involved ($q < n$). Assume the $q$
equality edits are of full rank. Then, a new
linear edit system, which is equivalent to the
original one, can be formed through
elimination using the $q$ equality edits. The new
system contains $m + q$ inequality edits
involving $n - q$ fields, with $n - q$
corresponding positivity constraints, and the
original $q$ equality edits. The extremal points
of the original linear system can thus be
obtained by first obtaining the extremal points
in the $n - q$ fields of the new linear system
excluding the $q$ equality edits, and then
determining the remaining $q$ fields using
the $q$ equality edits.

Proof  of  Theorem  2  is  provided  in 
Appendix.  The  elimination  process,  as 
described  in  the  proof  of  Theorem  2,  is 
illustrated  by  the  following  example. 


Example 3. Consider the following set of
linear edits ($m = 1$ and $q = 2$):

$3x_1 + 3x_2 + x_3 = 3$,  (5.1)
$2x_1 + x_2 + 2x_3 = 4$,  (5.2)
$4x_1 + 2x_2 + x_3 \le 3\tfrac{1}{2}$,  (5.3)
$x_j \ge 0, \quad j = 1, 2, 3$.

Here, for a convenient setting to display the
elimination process, we list equality edits
above inequality edits.

First, use equality edit (5.1) to eliminate a
field, say, $x_3$. With (5.2):

$4x_1 + 5x_2 = 2$.  (5.4)

With (5.3):

$x_1 - x_2 \le 1/2$.  (5.5)

Also, with the positivity constraint $x_3 \ge 0$:

$3x_1 + 3x_2 \le 3$, or
$x_1 + x_2 \le 1$.  (5.6)

Now the new edit system, resulting from the
first stage of elimination, consists of two
inequality edits, (5.5) and (5.6), involving
fields $(x_1, x_2)$, two corresponding positivity
constraints $x_j \ge 0$, $j = 1, 2$, and two equality
edits, (5.1) and (5.4) (or, equivalently, the two
original equality edits, (5.1) and (5.2)).

In the second stage of elimination, we use
equality edit (5.4) to eliminate another field,
say, $x_2$, in the inequality edits. With (5.5):

$x_1 \le 1/2$.  (5.7)

With (5.6):

$x_1 \le 3$.  (5.8)

And with the positivity constraint $x_2 \ge 0$:

$4x_1 \le 2$, or
$x_1 \le 1/2$.  (5.9)

The new edit system, resulting from the
second stage of elimination, consists of three
inequality edits, (5.7), (5.8) and (5.9),
involving only field $x_1$, the positivity
constraint $x_1 \ge 0$, and two equality edits (5.1)
and (5.4) (or, (5.1) and (5.2)).

Now we can solve the reduced linear
system in $x_1$, that is, (5.7), (5.8) and (5.9),
with the constraint $x_1 \ge 0$. We find two
extremal points $x_1 = 0$ and $x_1 = 1/2$. Then, using
(5.4), we get $x_2 = 2/5$ and $0$ respectively; and
using (5.1), $x_3 = 9/5$ and $3/2$ respectively.
Thus, the extremal points of the original
system are

$(0, 2/5, 9/5)$ and $(1/2, 0, 3/2)$.
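
The back-substitution step of Example 3 can be checked numerically with a short Python sketch. The helper names `back_substitute` and `passes_original` are illustrative only; the two starting values are the extremal points of the reduced system in $x_1$, and exact fractions are used throughout.

```python
from fractions import Fraction as F

# Extremal points of the reduced system (5.7)-(5.9) with x1 >= 0.
x1_points = [F(0), F(1, 2)]

def back_substitute(x1):
    """Recover the eliminated fields from the equality edits."""
    x2 = (2 - 4 * x1) / 5       # from (5.4): 4*x1 + 5*x2 = 2
    x3 = 3 - 3 * x1 - 3 * x2    # from (5.1): 3*x1 + 3*x2 + x3 = 3
    return (x1, x2, x3)

def passes_original(x):
    """Check a point against the original edits (5.1)-(5.3) and positivity."""
    x1, x2, x3 = x
    return (3 * x1 + 3 * x2 + x3 == 3
            and 2 * x1 + x2 + 2 * x3 == 4
            and 4 * x1 + 2 * x2 + x3 <= F(7, 2)
            and min(x) >= 0)

points = [back_substitute(p) for p in x1_points]
for p in points:
    print(tuple(str(v) for v in p), passes_original(p))
```

Both recovered points satisfy the original system, which is what the equivalence claimed in Theorem 2 would lead one to expect.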

We may simultaneously eliminate two
fields using the two equality edits. By this, we
first convert the equalities to the canonical
form through Gaussian elimination (see, e.g.,
Luenberger (1984)), and then substitute them
into the inequalities. (5.1) and (5.2) can be
written in such form in $(x_1, x_2)$ as

$x_1 + \tfrac{5}{3} x_3 - 3 = 0$,  (5.10)
$x_2 - \tfrac{4}{3} x_3 + 2 = 0$.  (5.11)

Substituting (5.10) and (5.11) into the
inequality edit (5.3) to eliminate $x_1$ and $x_2$, it
follows

$x_3 \ge 3/2$;  (5.12)

and into $x_1 \ge 0$, it follows

$-\tfrac{5}{3} x_3 + 3 \ge 0$;  (5.13)

and into $x_2 \ge 0$, it follows

$\tfrac{4}{3} x_3 - 2 \ge 0$.  (5.14)

Now the new edit system, resulting from the
elimination of $(x_1, x_2)$, contains three
inequality edits in $x_3$, (5.12), (5.13) and
(5.14), with the positivity constraint $x_3 \ge 0$,
and the two equality edits, (5.10) and (5.11).
We first solve the simplified linear system in
$x_3$, obtaining the two extremal points,
$x_3 = 3/2$ and $x_3 = 9/5$. Then, use (5.10) and
(5.11) to determine the remaining fields,
$x_1$ and $x_2$: for $x_3 = 3/2$, $x_1 = 1/2$ and
$x_2 = 0$; and for $x_3 = 9/5$, $x_1 = 0$ and
$x_2 = 2/5$.


6. ERROR LOCALIZATION

The error localization problem is stated as:
for a failed record, anticipating the F-H
principles, which components of the record
must be changed in order that, with as few as
possible changes, the record can be made to
pass the edit system?

In linear editing, the linear programming
approach to solving the error localization
problem (Sande, 1978; Schiopu-Kratina and
Kovar, 1989) is briefly described as follows.

Let $x_0$ be a failed record with respect to the
linear edit system (2a, 2b). Let $\Delta x$ be the
correction vector in the sense that $x_0 + \Delta x$
passes all the edits of (2a, 2b), that is,

$A(x_0 + \Delta x) \le b$,
$x_0 + \Delta x \ge 0$.

Since $x_0$ is known, rewrite the above system
as

$A \Delta x \le b - A x_0$,  (6)
$\Delta x \ge -x_0$.

The usual technique to solve (6) is to
express the change $\Delta x$ as a difference between
the positive and negative changes:


$\Delta x = u - v$,

where both $u$ and $v$ are nonnegative vectors
and their inner product is zero, $u^T v = 0$ (i.e.,
for any field, there may be either positive or
negative change, but not both).

Denote by $|x|_+$, called cardinality, the
number of strictly positive elements of a
nonnegative vector $x$. The problem (6) can be
stated as: Find all possible correction vectors
$(u^T, v^T)^T$ such that the cardinality of
$|u - v|$ is a minimum, subject to:

$A(u - v) \le b - A x_0$,  (7)
$u - v \ge -x_0$,
$u, v \ge 0$,
$u^T v = 0$.

Problem (7) can be restated with respect to a
linear system in $(u, v) \in R^{2n}$, in standard
form, with suitable matrices $A_1$ and $b_1$, as:

$\min |u - v|_+$

subject to:

$A_1 \begin{pmatrix} u \\ v \end{pmatrix} \le b_1$,  (8a)
$u, v \ge 0$,  (8b)
$u^T v = 0$.  (8c)

The complementary condition (8c) is actually
redundant in the problem, because the
minimum of $|u - v|_+$ is always reached at a
point that satisfies (8c).
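
The role of the split $\Delta x = u - v$ and of the complementary condition can be illustrated with a small Python sketch. The function names are hypothetical. The sketch relies only on the observation that the constraints of (7) involve $u$ and $v$ through $u - v$, so subtracting the common part of $u$ and $v$ leaves the correction unchanged while making $u^T v = 0$.

```python
def split_correction(delta):
    """Split a correction vector into nonnegative parts u, v with delta = u - v."""
    u = [max(d, 0) for d in delta]
    v = [max(-d, 0) for d in delta]
    return u, v

def cardinality(x):
    """Number of strictly positive elements of a nonnegative vector."""
    return sum(1 for value in x if value > 0)

def make_complementary(u, v):
    """Remove the common part of u and v; u - v is unchanged and u.v becomes 0."""
    common = [min(a, b) for a, b in zip(u, v)]
    return ([a - c for a, c in zip(u, common)],
            [b - c for b, c in zip(v, common)])

u, v = split_correction([0, -3, 0, 2])
print(u, v)                                   # [0, 0, 0, 2] [0, 3, 0, 0]
print(cardinality([a + b for a, b in zip(u, v)]))   # 2 fields are changed

# A non-complementary pair describing the same correction vector.
u2, v2 = make_complementary([1, 4, 0, 2], [1, 7, 0, 0])
print(u2, v2)                                 # [0, 0, 0, 2] [0, 3, 0, 0]
```

When $u^T v = 0$, the elementwise absolute value of $u - v$ equals $u + v$, which is why the count of changed fields is taken on that sum in the sketch.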


Rubin (1975) noticed the monotone
property with Chernikova's algorithm in
processing a row, that the cardinality of any
new column generated is no less than that of
its generating columns. He modified
Chernikova's algorithm to solve the following
cardinality constrained linear program
problem:

$\max d^T x$

subject to

$Ax \le b$,  (9)
$x \ge 0$,
$|x|_+ \le \eta$,

where $x$ and $d$ are $n \times 1$, $A$ is $m \times n$, $b$ is
$m \times 1$, and $\eta$ is a positive integer less than
$\min\{m, n\}$, by directly producing the
extremal points of the feasible area
$G = \{x \mid Ax \le b, x \ge 0\}$ that satisfy
$|x|_+ \le \eta$, and then determining the optimal
extremal point. As Tanahashi and Luenberger
(1971) showed, an optimal solution to (9) can
always be found in $G$.

Rubin’s  cardinality  constrained  linear 
program  has  been  adopted  as  a  standard 
formulation  of  the  linear  editing  error 
localization  problem,  e.g.,  GEIS  (Sande,  1978; 
Schiopu-Kratina  and  Kovar,  1989)  and 
CherryPi  (De  Waal,  1996). 

Houbiers  (1999)  applied  Duffin’s  method 
on  Fourier’s  analysis  of  linear  inequality 
systems  to  error  localization.  He  compared 
Duffin's method with Chernikova's algorithm,
two similar algorithms with different control
rules for excessive growth of the matrix, and
showed that Duffin's method is expected to be
more efficient. Quere (2000) developed a new
algorithm which performs Fourier elimination
in a tree search process, instead of a
Chernikova's algorithm-like process, to
determine  all  optimal  solutions  to  the  error 
localization  problem  (see  also  Quere  and  De 
Waal,  2000). 

In  the  presence  of  equality  edits  in  the 
linear  edit  system,  by  the  elimination 
methodology provided in the last section, we can
solve  the  error  localization  problem  with 
respect  to  a  simplified  system  in  reduced 
dimension,  as  described  below. 

Through elimination by the equality edits,
the linear edit system is restructured into the
following form:

$L_1$: $A_1 x^{(1)} \le b^{(1)}$,
$x^{(1)} \ge 0$,

and

$L_2$: $A_2 x = b^{(2)}$,

where $x = (x^{(1)T}, x^{(2)T})^T$ ($n \times 1$), with $x^{(1)}$ ($(n - q) \times 1$)
consisting of the fields involved in the
inequality edits in $L_1$, and $x^{(2)}$ ($q \times 1$) consisting
of the fields eliminated from the inequality
edits; $A_1$ is $m_1 \times (n - q)$, $A_2$ ($q \times n$) is of full
rank, $b^{(1)}$ is $m_1 \times 1$, and $b^{(2)}$ is $q \times 1$.

Let $x_0 = (x_0^{(1)T}, x_0^{(2)T})^T$ be a failed record. The
correction procedure is: if the subrecord $x_0^{(1)}$
fails $L_1$, perform error localization and
imputation for $x_0^{(1)}$ with respect to system $L_1$,
and then correct $x_0^{(2)}$ by the imputed
$x_0^{(1)}$ using the equality edits of $L_2$. If $x_0^{(1)}$ is
feasible with respect to $L_1$, but $x_0$ fails $L_2$, we
only need to correct $x_0^{(2)}$, again, by $x_0^{(1)}$ using
the equality edits of $L_2$, a deterministic
imputation.

Benefits in computational efficiency for
error localization can be significant from
application of the elimination methodology.
In processing a row with Chernikova's
algorithm, excessive growth of the number of
columns depends on the number of fields,
which causes the storage problem. Reduction
of the number of fields reduces the magnitude
of computation. Also, the computer code does
not need to handle equality edits, which also
simplifies the computation.

7. IMPLEMENTATION

In  linear  editing,  elimination  of  fields  by 
equality  edits  restructures  the  linear  edit 
system.  This  restructuring  is  conducted  prior 
to  data  editing,  since  data  are  not  involved.  A 
separate  module  can  be  created  to  perform  the 
elimination. 
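
The elimination step that such a module would perform can be sketched as follows. This is a minimal illustration, not the author's implementation: the function name `eliminate_field`, the representation of an edit as a `(coeffs, bound)` pair, and the three-field balance edit are all assumptions made for the example. Each inequality edit is written as `sum(coeffs[j] * x[j]) <= bound`, and the positivity constraint of the eliminated field is passed in as the inequality `-x_h <= 0`, so that it is converted together with the other edits.

```python
from fractions import Fraction


def eliminate_field(ineqs, eq, h):
    """Use one equality edit to remove field h from a list of inequality edits.

    ineqs: list of (coeffs, bound), each meaning sum(coeffs[j] * x[j]) <= bound
    eq:    (coeffs, bound), meaning sum(coeffs[j] * x[j]) == bound
    h:     index of a nonzero field of the equality edit
    """
    eq_coeffs, eq_bound = eq
    if eq_coeffs[h] == 0:
        raise ValueError("field h must be a nonzero field of the equality edit")
    reduced = []
    for coeffs, bound in ineqs:
        # Subtracting a multiple of an equality keeps the direction of the
        # inequality, whatever the sign of the multiplier.
        f = Fraction(coeffs[h]) / Fraction(eq_coeffs[h])
        new_coeffs = [Fraction(c) - f * Fraction(e)
                      for c, e in zip(coeffs, eq_coeffs)]
        reduced.append((new_coeffs, Fraction(bound) - f * Fraction(eq_bound)))
    return reduced


# Hypothetical balance edit x0 + x1 - x2 = 0 (x2 is the total).
balance = ([1, 1, -1], 0)
edits = [
    ([0, 0, 1], 100),  # x2 <= 100
    ([0, 0, -1], 0),   # positivity of x2, written as -x2 <= 0
]
reduced = eliminate_field(edits, balance, h=2)
# reduced now holds x0 + x1 <= 100 and -x0 - x1 <= 0, with x2 eliminated.
```

Because the edit coefficients alone are used, the sketch runs without any data records, in line with performing the restructuring prior to data editing. A successive elimination over several equality edits would apply the same step repeatedly, also substituting into the remaining equality edits at each stage.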

Generally, when q (linearly independent)
equality edits are present in the linear edit
system, any subset of q fields may be selected
for elimination from the inequality edits,
provided the elimination process is valid
according to Theorems 1 and 2, that is,
provided the q variables are linearly
independent. When performing a successive
elimination, at each stage, there is no
additional theoretical criterion for choosing a
field for elimination, besides the general
requirement of a nonzero field.

Practically,  some  strategies  may  be 
developed  for  choosing  the  fields  for 
elimination.  At  each  stage  of  elimination, 
maximizing  the  number  of  zeros  in  the 
coefficients  of  inequality  edits  appears  to  be  a 
practical  criterion.  Aggregate  variables  are 
natural  candidates  for  elimination.  Other 
strategies  may  be  developed  based  on  the 
structure  of  the  edit  system. 
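
One possible reading of the zero-maximizing criterion is sketched below. It is only an assumption about how the heuristic might be coded: the names `zeros_after_elimination` and `choose_field` are hypothetical, edits are again `(coeffs, bound)` pairs meaning `sum(coeffs[j] * x[j]) <= bound`, and ties are broken arbitrarily by taking the first best candidate.

```python
from fractions import Fraction


def zeros_after_elimination(ineqs, eq, h):
    """Count zero coefficients in the inequality edits after eliminating field h."""
    eq_coeffs, _ = eq
    count = 0
    for coeffs, _ in ineqs:
        f = Fraction(coeffs[h]) / Fraction(eq_coeffs[h])
        count += sum(1 for c, e in zip(coeffs, eq_coeffs) if c - f * e == 0)
    return count


def choose_field(ineqs, eq):
    """Pick the nonzero field of the equality edit that leaves the most zeros."""
    candidates = [h for h, c in enumerate(eq[0]) if c != 0]
    return max(candidates, key=lambda h: zeros_after_elimination(ineqs, eq, h))


# Hypothetical system: balance edit x0 + x1 - x2 = 0 with x2 the aggregate.
balance = ([1, 1, -1], 0)
edits = [
    ([1, 0, 0], 60),   # x0 <= 60
    ([0, 1, 0], 80),   # x1 <= 80
    ([1, -1, 0], 0),   # x0 <= x1
]
best = choose_field(edits, balance)
```

In this hypothetical system the aggregate field x2 appears in none of the inequality edits, so eliminating it leaves them unchanged and it scores highest. The converted positivity constraint of the eliminated field is left out of the count because it contributes the same number of zeros for every candidate.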

In  computer  implementation  of  the 
elimination  process,  either  successive 
elimination  or  simultaneous  elimination  can 
be  performed,  as  illustrated  by  the  examples  in 
Section  5. 
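
Once the system has been restructured, the two-step correction procedure of Section 6 can be sketched as below. This is a hedged illustration under stated assumptions: `solve_square`, `impute_eliminated`, and `correct_record` are hypothetical names, `localize_and_impute` stands for whatever error localization and imputation routine is applied to the reduced system (it is not implemented here), and the small four-field system is invented for the example.

```python
from fractions import Fraction


def solve_square(M, rhs):
    """Solve M z = rhs by Gauss-Jordan elimination with exact fractions."""
    q = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(M, rhs)]
    for col in range(q):
        pivot = next((r for r in range(col, q) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("equality edits do not determine the eliminated fields")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(q):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


def impute_eliminated(A2, b2, kept, eliminated, x1):
    """Deterministically impute the eliminated fields from A2 x = b2."""
    M = [[row[j] for j in eliminated] for row in A2]
    rhs = [b - sum(row[j] * v for j, v in zip(kept, x1))
           for row, b in zip(A2, b2)]
    return solve_square(M, rhs)


def correct_record(x1, A1, b1, A2, b2, kept, eliminated, localize_and_impute):
    """Correct the kept fields against L1 if needed, then recompute the rest."""
    feasible = all(v >= 0 for v in x1) and all(
        sum(a * v for a, v in zip(row, x1)) <= b for row, b in zip(A1, b1))
    if not feasible:
        x1 = localize_and_impute(x1)  # hypothetical routine on the reduced system
    # Recomputing x2 is harmless when the record already satisfies L2,
    # since the equality edits determine x2 uniquely from x1.
    return x1, impute_eliminated(A2, b2, kept, eliminated, x1)


# Hypothetical edits on fields x0..x3: x0 + x1 - x2 = 0 and x2 - x3 = 10.
A2 = [[1, 1, -1, 0], [0, 0, 1, -1]]
b2 = [0, 10]
A1, b1 = [[1, 1]], [100]          # reduced inequality edit: x0 + x1 <= 100
x1, x2 = correct_record([30, 20], A1, b1, A2, b2, kept=[0, 1],
                        eliminated=[2, 3], localize_and_impute=lambda v: v)
```

Here the kept fields already satisfy the reduced system, so only the deterministic imputation is exercised. Exact fractions are used to avoid rounding when the equality edits have non-unit coefficients.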

8.  RECOMMENDATIONS 

The  edit  specifications  for  many  of  NASS’ 
surveys  include  a  substantial  number  of 
equality  edits.  Balance  edits  are  a  common 
example  of  this  type  of  edit.  The  Census  of 
Agriculture,  in  particular,  requires  a  very 
extensive  editing  system,  incorporating  many 
edits  of  this  type.  As  a  result,  the  variable 
elimination  methodology  provided  by  this 
paper  is  especially  useful  in  the  context  of 
researching  the  possible  incorporation 
of  error-localization  into  the  editing  system  for 
the  2007  U.S.  Census  of  Agriculture. 

The  implementation  of  this  methodology, 
in  conjunction  with  other  computational 
improvements,  may  enable  Fellegi-Holt 
methodology  to  be  implemented  into  the 
editing  systems  for  future  censuses  and  sample 
surveys  with  improved  efficiency  and 
accuracy. 

The  author  makes  the  following 
recommendations  for  further  research: 


1)  Develop  an  automated  approach  for 
implementing  the  proposed  variable 
elimination  process  from  an  initial  set  of 
linear  edits. 

2)  Calculate  the  computational  gains  from 
implementing  this  methodology  on  the 
linear  edits  prepared  specifically  for  the 
Census  of  Agriculture.

APPENDIX:  Proof  of  Lemmas  and
Theorems

Proof  of  Lemma  1 : 

The  lemma  is  clearly  true.  Since  we  can 
always  make  the  coefficient  of  the  generating 
field  in  the  equality  edit  to  be  opposite  in  sign 
to  that  in  the  other  edit,  the  lemma  is  thus  an 
immediate  consequence  of  Fellegi  and  Holt 
(1976)  Theorem  3. 

Proof  of  Lemma  2: 

The set of edits $e_r$ and $e_s$ is equivalent to
the set of edits $e_t$ and $e_s$, since edit $e_r$ can also
be generated as an implied edit by edits
$e_t$ and $e_s$. Thus we may use the set of edits
$e_t$ and $e_s$ to replace the original set of edits $e_r$
and $e_s$; or, equivalently, use the essentially
new implied edit $e_t$ to replace the original
inequality edit $e_r$.

Proof  of  Theorem  1 : 

Let $e_i$, $i = 1, 2, \ldots, m$, be the $m$ inequality
edits of the linear edit system, and $\tilde{e}$, the one
equality edit. Denote $g_j$ for the positivity
constraint $x_j \ge 0$, $j = 1, 2, \ldots, n$.

Suppose $x_h$ is a nonzero field of $\tilde{e}$. For
each inequality edit $e_i$, for which $x_h$ is also a
nonzero field, by Lemmas 1 and 2, an
essentially new implied edit can be generated
from $e_i$ and $\tilde{e}$ using field $x_h$ as the generating
field, and replaces $e_i$ in the original set of
edits. Also, an essentially new implied edit
can be generated from $g_h$, the positivity
constraint for $x_h$, and $\tilde{e}$ using $x_h$ as the
generating field, and replaces $g_h$. The
resulting linear edit system is equivalent to the
original system. The new system contains
$m + 1$ inequality edits in which $x_h$ is
eliminated, with $n - 1$ positivity constraints
$g_j$, $j \ne h$, and the original equality edit $\tilde{e}$.

In the new linear system, $\tilde{e}$ is the only edit that
involves $x_h$. Formally, $x_h$ is a free variable in
$\tilde{e}$. Thus, the extremal points of the new linear
system can be obtained by first obtaining the
extremal points in $x_j$, $j \ne h$, of the new
linear system excluding $\tilde{e}$, and then
determining $x_h$ using $\tilde{e}$. These extremal points
are also those of the original linear system.

The  proof  is  completed. 
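
As an illustration of the construction in this proof, consider a small hypothetical system (three fields, one inequality edit, one balance edit; not taken from the paper's examples), with $x_3$ chosen as the generating field:

```latex
\begin{align*}
\text{Original system:}\quad
  & \tilde{e}:\ x_1 + x_2 - x_3 = 0, \qquad
    e_1:\ x_3 \le 100, \qquad
    g_j:\ x_j \ge 0,\ j = 1, 2, 3. \\
\text{From } e_1 \text{ and } \tilde{e}:\quad
  & x_1 + x_2 \le 100. \\
\text{From } g_3 \text{ and } \tilde{e}:\quad
  & x_1 + x_2 \ge 0. \\
\text{New system:}\quad
  & x_1 + x_2 \le 100, \quad x_1 + x_2 \ge 0, \quad
    x_1 \ge 0, \quad x_2 \ge 0, \quad
    \tilde{e}:\ x_3 = x_1 + x_2 .
\end{align*}
```

With $m = 1$ and $n = 3$, the new system has $m + 1 = 2$ inequality edits free of $x_3$ and $n - 1 = 2$ positivity constraints, and $\tilde{e}$ is the only edit involving $x_3$. Any extremal point in $(x_1, x_2)$ of the reduced system extends to one of the original system by setting $x_3 = x_1 + x_2$.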

Proof  of  Theorem  2: 

This theorem is a result of repeated
application of the elimination method given
by Theorem 1. For convenience, the proof is
made for $q = 2$. For $q > 2$, the proof can be
formally given by induction, and is omitted
here.

Denote $L$ for the linear edit system in the
theorem. Let $e_i$, $i = 1, 2, \ldots, m$, be the $m$
inequality edits, $\tilde{e}_k$, $k = 1, 2$, the two equality
edits. Denote $g_j$ for the positivity constraint
$x_j \ge 0$, $j = 1, 2, \ldots, n$.

Suppose $x_h$ is a nonzero field of the
equality edit $\tilde{e}_1$. By Theorem 1, using $\tilde{e}_1$, we
can eliminate field $x_h$ from all other edits that
involve $x_h$, including the $m$ inequality edits,
$e_i$, $i = 1, 2, \ldots, m$, the equality edit $\tilde{e}_2$, and
the positivity constraint $g_h$. Denote $e_i^{(1)}$ for the
resulting $m$ inequality edits from
$e_i$, $i = 1, 2, \ldots, m$, $\tilde{e}_2^{(1)}$ for that from $\tilde{e}_2$, and
$e_{m+1}^{(1)}$ for that from $g_h$. Denote $L^{(1)}$ for the
resulting linear edit system. $L^{(1)}$ then contains
$m + 1$ inequality edits, $e_i^{(1)}$,
$i = 1, 2, \ldots, m + 1$, which involve fields
$x_j$, $j \ne h$, the $n - 1$ corresponding positivity
constraints, $g_j$, $j \ne h$, and the two equality
edits, $\tilde{e}_1$ and $\tilde{e}_2^{(1)}$ (one original and one
generated). $L^{(1)}$ is equivalent to $L$. This is the
first stage of elimination.

Now we perform the second stage of
elimination with respect to $L^{(1)}$. Let $x_{h'}$ be a
nonzero field of the equality edit
$\tilde{e}_2^{(1)}$ ($h' \ne h$; $h'$ exists by the full rank
assumption for the equality edits). We use
$\tilde{e}_2^{(1)}$ to eliminate field $x_{h'}$ from all other edits,
except $\tilde{e}_1$, that involve $x_{h'}$, including the
$m + 1$ inequality edits, $e_i^{(1)}$,
$i = 1, 2, \ldots, m + 1$, and the positivity
constraint $g_{h'}$ for $x_{h'}$. Denote $e_i^{(2)}$,
$i = 1, 2, \ldots, m + 2$, for the resulting $m + 2$
inequality edits.

Denote $L^{(2)}$ for the resulting linear edit
system from the second stage of elimination.
$L^{(2)}$ contains $m + 2$ inequality edits,
$e_i^{(2)}$, $i = 1, 2, \ldots, m + 2$, involving fields
$x_j$, $j \ne h, h'$, the $n - 2$ corresponding
positivity constraints, $g_j$, $j \ne h, h'$, and the
two equality edits, $\tilde{e}_1$ and $\tilde{e}_2^{(1)}$ (or equivalently,
the original set of equality edits $\tilde{e}_1$ and $\tilde{e}_2$).
$L^{(2)}$ is equivalent to $L^{(1)}$, and hence to $L$.

We have thus established, for $q = 2$, the
structure of the new linear system through
elimination, as stated in the theorem. The
general truth of the theorem for any $q$ can be
established by induction. The statement in the
theorem for obtaining the extremal points of
the original system through the new system is
an immediate consequence of the structure of
the new linear system. The proof is completed.